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During the last years, low-rank tensor approximation has been established as a new tool in scientific comput- 
ing to address large-scale linear and multilinear algebra problems, which would be intractable by classical 
techniques. This survey attempts to give a literature overview of current developments in this area, with an 
emphasis on function-related tensors. 



1 Introduction 

This survey is concerned with tensors in the sense of multidimensional arrays. A general tensor of order d 
and size n\ x n 2 x • ■ ■ x for integers n\, n 2 , ■ ■ ■ , will be denoted by 

X £ gniX^X-xnj 

An entry of X is denoted by Xi ± i d where each index 6 {1, . . . , n M } refers to the /ith mode of the 
tensor for fi = 1, . . . , d. For simplicity, we will assume that X has real entries, but it is of course possible 
to define complex tensors or, more generally, tensors over arbitrary fields. 

A wide variety of applications lead to problems where the data or the desired solution can be represented 
by a tensor. In this survey, we will focus on tensors that are induced by the discretization of a multivariate 
function; we refer to the survey [168] and to the books [174 , 239] for the treatment of tensors contain- 



ing observed data. The simplest way a given multivariate function f(x\, X2, ■ ■ ■ , Xd) on a tensor product 
domain £1 = [0, l] d leads to a tensor is by sampling / on a tensor grid. In this case, each entry of the 
tensor contains the function value at the corresponding position in the grid. The function / itself may, for 
example, represent the solution to a high-dimensional partial differential equation (PDE). 

As the order d increases, the number of entries in X increases exponentially for constant n = n\ = 
■ ■ ■ = rid- This so called curse of dimensionality prevents the explicit storage of the entries except for very 
small values of d. Even for n = 2, storing a tensor of order d = 50 would require 9 petabyte! It is therefore 
essential to approximate tensors of higher order in a compressed scheme, for example, a low-rank tensor 
decomposition. Various such decompositions have been developed, see Section[2] An important difference 
to tensors containing observed data, a tensor X induced by a function is usually not given directly but only 
as the solution of some algebraic equation, e.g., a linear system or eigenvalue problem. This requires the 
development of solvers for such equations working within the compressed storage scheme. Such algorithms 
are discussed in Section[3] 

The range of applications of low-rank tensor techniques is quickly expanding. For example, they have 
been used for addressing: 



the approximation of parameter-dependent integrals 1 15 164, 190|, multi-dimensional integrals [48 
|112| [153|, and multi-dimensional convolution ]152| ; 
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• computational tasks in electronic structure calculations, e.g., based on Hartree-Fock or DFT mod- 
els !^|^|?T1|^|^|?71|571 [nT|fT^[T3ni[T3glfT391 ; 



the solution of stochastic and parametric PDEs [ 68 69 ,[76l [77l[TM1[T6^[T7T|[T89l|202| ; 



approximation of Green's functions of high-dimensional PDEs 1 30 1 13 1; 



• the solution of the Boltzmann equation 1 129 149], and the chemical master / Fokker-Planck equa- 
tions |4l[^[M1Pl|8^ [T22l[T^[r331[T96l 



• the solution of high-dimensional Schrodinger equations [46, 160, 172, 184, 192, 193 1; 

• computational tasks in micromagnetics j80l[8T[ ; 

• rational approximation problems 1 242| ; 

• computational homogenization [42,96 176); 

• computational finance 1 134 137| ; 

• the approximation of stationary states of stochastic automata networks [38 1; 

• multivariate regression and machine learning (25). 

Note that the list above mainly focuses on techniques that involve tensors; low-rank matrix techniques, 
such as POD and reduced basis methods, have been applied in an even broader setting. 

A word of caution is appropriate. Even though the field of low-rank tensor approximation is relatively 
young, it already appears a daunting task to give proper credit to all developments in this area. This survey 
is biased towards work related to the TT and hierarchical Tucker decompositions. Surveys with a similar 
scope are the lecture notes by Grasedyck p02[ , Khoromskij 1 147|153|155) , and Schneider [233 1, as well as 
the monograph by Hackbusch [ 108 1. Less detailed attention may be given to other developments, although 
we have made an effort to at least touch on all important directions in the area of function-related tensors. 



2 Low-rank tensor decompositions 

As mentioned in the introduction, it will rarely be possible to store all entries of a higher-order tensor 
explicitly. Various compression schemes have been developed to reduce storage requirements. For d = 2 all 



these schemes boil down to the well known reduced singular value decomposition (SVD) of matrices [97 1; 
however, they differ significantly for tensors of order d > 3. 



2.1 CP decomposition 

The entries of a rank-one tensor X can be written as 



(1) (2) 



.(<*) 



.d. 



(1) 



By defining the vectors u 



in) , = c>) 



, ,u 



, a more compact from of this relation is given by 



vec{X) 



= „(«0 



u 



(d-i) 



(i) 



where ® denotes the usual Kronecker product and vec stacks the entries of a tensor into a long column 
vector, such that the indices are in reverse lexicographical order. (Using the vector outer product o, this 

(!) o u' 2 ) o • • • o uW.) When X represents the discretization of a separable 



relation takes the form X 

function f(xi,X2, ■ • ■ , Xd) = fi{x\) $2(3:2) ■ ■ ■ fd(xd) then X is a rank-one tensor with each vector u 1 -^ 
corresponding to a discretization of / M . 

The CP (CANDECOMP/PARAFAC) decomposition is a sum of rank-one tensors: 



vec(A') 



<8> u \ 



(d-i) 



id) 



® U 



(d-1) 
R 



(2) 
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The tensor rank of X is defined as the minimal R such that X has a CP decomposition with R terms. Note 
that, in contrast to matrices, the set CP{R) of tensors of rank at most 7? is in general not closed, which 
renders the problem of finding a best low-rank approximation ill-posed |56). For more properties of the CP 
decomposition, we refer to the survey paper 1 168 1. 

The CP decomposition requires the storage of [ni + n-2 + ■ ■ ■ + n d )R entries, which becomes very 
attractive for small R. To be able to use the CP decomposition for the approximation of function-related 
tensors, robust and efficient compression techniques are essential. In particular, the truncation of a rank-i? 
tensor to lower tensor rank is frequently needed. Nearly all existing algorithms are based on carefully 
adapting existing optimization algorithms, see, once again, [168] for an overview of the literature until 
around 2009. More recent developments for general tensors include work on increasing the efficiency and 
robustness of gradient-based and Newton-like methods [2,72,74, 140,223,224|, modifying and improving 
ALS (alternating least squares) gl][57][58)[90)[225], studying the convergence of ALSj5T][T94][Bj| and 



reducing the cost of the unfolding operations required during the approximation [222 269]. 

One can impose additional structure on the coefficients of the CP decomposition, such as nonnegativity. 
As this is of primary interest in data analysis applications, a comprehensive discussion is beyond the scope 
of this survey, see |168| . 



2.2 Tucker decomposition 

A Tucker decomposition of a tensor X takes the form 

vec(Af) = (U d ® U d -i ®---®Ux) vec(C), (3) 

where Ui, U2, ■ ■ ■ , U d , with £ R n * 4X '>, are called the factor matrices or basis matrices and C G 
jjrix - xr d j s ca jj e( j the core tensor of the decomposition. 

Like CP, the Tucker decomposition has a long history and we refer to the survey [ 168 1 for a more detailed 



account. In the following, we briefly summarize the basic techniques, which are needed to motivate the TT 
and HT decompositions discussed below. 

The Tucker decomposition is closely related to the matricizations of X. The /ith matricization X^> is 
an rip x (ni • • • n^in^i ■ ■ ■ n d ) matrix formed in a specific way from the entries of X: 

:= -V, ,.. I = 1 + YJ$» ~ !) II n v + " !) II n "- 

In particular, the relation Q implies 

= U„ ■ C M ■ (U d ® • • • ® U„ +l ® ® • • • ® C/i) T , n = l,...,d. (4) 

It follows that rank(X f^)) < r M , as the first factor £ R"f XI > obviously has rank at most r M . This 
motivates to define the multilinear rank (also called /i-rank) of a tensor X as the tuple 

(ri,...,r d ), with r M =rank(X (Al) ). 

In contrast to the tensor rank related to the CP decomposition, the set T(ri , . . . , rd) of tensors of ^-rank at 
most r p is closed. 



Another consequence of the relation Q is the higher-order SVD (HOSVD) introduced in [54 55) for 



approximating a tensor by a Tucker decomposition ([3]) of lower multilinear rank. In HOSVD, the columns 
of each factor matrix are computed as the dominant left singular vectors of jW. The core tensor is 

then obtained by forming vec(C) := (U d <g> • • • <£> Ui) T \ec(X). Eventually, this yields 

vec(A ; ) := (U d ® ■ ■ ■ <g> U x ) ■ vec(C) € T(k u . . . , k d ). 
In contrast to the matrix case, where the SVD yields a best low -rank approximation for all unitarily invariant 



norms 1 125 Sec. 7.4.9], the truncated tensor X resulting from the HOSVD is usually not optimal. However, 
we have 

\\X-X\\<yfd mm \\X-y\\. 
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(iv) 



(v) 



(i) 
(ii) 
(iii) 




Fig. 1 Tensor network diagrams representing (i) a vector, (ii) a matrix, (iii) a matrix-matrix multiplication, (iv) a 
tensor in Tucker decomposition, and (v) a tensor in TT decomposition. 



This quasi-optimality condition is usually sufficient for the purpose of obtaining an accurate approximation 
to a function-related tensor. 

Various alternatives to improve on the approximation provided by the HOSVD have been developed, 
see 11168 



131 132 



and the references therein. Recent developments include Newton-type methods on manifolds fTT] 
229 1, a Jacobi algorithm for symmetric tensors [ 130], generalizations of Rrylov subspace meth- 



ods 98 228 1, and modifications of the HOSVD 1259). 



2.3 Tensor train decomposition 

The need for storing the r*i X • • • X r<j core tensor C renders the Tucker decomposition increasingly unattrac- 
tive as d gets larger. This has motivated the search for decompositions which potentially avoid these ex- 
ponentially growing memory requirements, while still featuring the two most important advantages of the 
Tucker decomposition: closedness and SVD-based compression. 

One well established candidate for such a decomposition is the so called TT (tensor train) decomposi- 
tion, which takes the form 

X il ,..., id = G 1 {i l )-G 2 {i 2 )---G d {id), G,(v)eK r -' x ^, (5) 

where r = = 1. For every mode and every index the coefficients G^i^) are matrices. In 
the context of numerical analysis, a decomposition of the form |5]l was first proposed in |212 213 217| . 



However, such a decomposition has been proposed earlier in the density-matrix renormalization group 
method (DMRG) for simulating quantum systems [235 , 266 1. In this area, the term matrix product state 
(MPS) representation for the decomposition |5]) has been established [218]. Suitable conditions that imply 
a unique MPS representation can be found in |221 263) . The connection between TT and MPS has been 



explained in 1 127) 



Similar to the Tucker decomposition, the TT decomposition is closely related to certain matriciza- 
tions of X. Let X^ 1 '---'^ denote the matrix obtained by reshaping the entries of X into an (nx ■ ■ ■ n^) x 
(n^+i • • • rid) array, such that |5) implies rank(X( 1,- " ,Al )) < r p for fi = 1, . . . , d. Consequently, the tuple 
containing the ranks of these matricizations is called the TT-rank of X. As explained, e.g., in |217[ a quasi- 
best approximation in a TT decomposition for a given TT-rank can be obtained from the SVDs of X^ 1 ''"'^, 
similarly to the HOSVD. It is important to avoid the explicit construction of these matrices and the SVDs 
when truncating a tensor in TT decomposition to lower TT-rank. Such truncation algorithms are described 
in | |217| . On the theoretical side, it turns out that the set TT(ri, . . . , rd-i) of tensors with TT-ranks 
bounded by r M is closed, and under a full rank condition it actually forms a smooth manifold 1 123 255 1. 



The Kahler manifold structure for complex MPS with open and periodic boundary conditions has been 
studied in |119| . 

Tensor network diagrams, which have been attributed to Penrose ]220) , are helpful in visualizing tensor 
decompositions and their manipulation. Figure [T] gives a few basic examples, see, e.g., 1 124 127 173) 



for more details. In particular, Figure [T](v) gives an illustration of the contraction |5]) representing a TT 
decomposition. In view of this diagram, the TT decomposition is also sometimes called linear tensor 
network [91 1. 
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{1} {2} {3} {4} 



Fig. 2 Left: Binary tree representing mode splitting for HT decomposition. Right: Tensor network diagram repre- 
senting a tensor in HT decomposition. 



In applications related to quantum spin systems, the tensor X often exhibits symmetries inherited from 
underlying physical properties. There are variants of MPS/TT that reflect such symmetries in the low-rank 
decomposition, see [ 128 221 227 1 and the references therein. 



2.4 Hierarchical Tucker decomposition 

An alternative way to reduce the complexity of the Tucker decomposition is given by the hierarchical 
Tucker (HT) decomposition [ 103, 1 17] (also called hierarchical tensor representation). This decomposition 
is based on the idea of recursively splitting the modes of the tensor, which results in a binary tree T 
containing a subset t C {1, . . . , d} at each node. An example of such a binary tree is given in the left 
plot of Figure|2] The matricization X® of a tensor X corresponding to such a subset t merges all modes 
contained in t into row indices of the matrix, and the other modes into column indices. We then consider 
a hierarchy of matrices U t whose columns span the image of X^' for each t 6 T. Hence, U t has exactly 
r t = rank(X(')) columns. The rank tuple (r t ) te j- is called the HT-rank of X. 

The following nestedness property allows for the implicit storage of (U t )teT^ an d tnus °f me tensor X: 
For t = ti U t r , t[ <~)t r — 0, there exists a matrix B t such that 

U t = {U tr ®U tl )B t , B t eR rt ' r ^ xrt . (6) 

For simplicity, we have assumed that the ordering of the modes in the tree T is such that all modes contained 
in ti are smaller than the modes contained in t r . The relation |6]) implies that it suffices to store the basis 
matrices Ut only for the leaf nodes t = {1}, {2}, . . . , {d}, and B t for all other nodes in T. The resulting 
storage requirements are 0(dnr + dr 3 ), when assuming r = r t and n = n^. 

Similarly to the Tucker and TT decompositions, a quasi-best approximation in the HT decomposition for 
a given HT-rank can be obtained from the SVDs of X^\ Algorithms that avoid the explicit computation of 
these SVDs when truncating a tensor that is already in HT decomposition are discussed in |103||117||173 
. As for the TT decomposition, the set of tensors having fixed HT-rank forms a smooth manifold [83 



175 



254 



255|. 



The tensor network corresponding to the HT decomposition is always a binary tree, see also the right 
plot of Figure [2] Such tensor tree networks had already been discussed in [ 238 1 (without the basis matrices 
at the leafs). Moreover, the so called multilayer multi-configuration time-dependent Hartree method (ML- 



MCTDH) introduced in [265 ] makes use of a decomposition based on general trees instead of binary trees. 
When allowing for general trees, tensor tree networks include the Tucker decomposition from Section |2T2 
as a (quite particular) special case. In the case of a degenerate tree, where at each level, one mode is split 
from the remaining modes, the HT decomposition becomes equivalent to a variant of the TT decomposition 
discussed in [62 217|. In contrast to the TT decomposition defined in ([5j, this variant features additional 



basis matrices, which may reduce the storage cost for large n^. A discussion on the difference between the 
ranks for the HT and TT decompositions can be found in |105| . 
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2.5 More general tensor network formats 

Motivated by an underlying topology describing interactions, tensor networks beyond trees have been 
considered in the context of renormalization group methods for simulating strongly correlated quantum 
spin systems. Well-known examples include the so called projected entangled-pair states (PEPS) [260, 
|261| and the multiscale entanglement renormalization ansatz (MERA) ]264[ . Both, PEPS and MERA 
contain cycles in the tensor network. Tree-structured tensor networks, as the hierarchical Tucker and the 
TT format, are closed 1 75 82 108) in the sense that tensors with ranks at most r M form a closed set in 
R" lX x ™ d . In general, this statement does not hold for tensor networks containing cycles 1 177 178 1 . 
Possibly for this reason, more general networks have not yet been considered to a large extend in the 
numerical analysis community for, e.g., the solution of high-dimensional PDEs, but see f75]|121[ for some 
recent mathematically oriented work. 



2.6 Hybrid formats 

Adding to the diversity of the formats discussed above, it is possible and sometimes useful to combine 
different low-rank formats. One popular combination is the Tucker format combined with the CP format 



for the approximation of the core tensor 1 148 156 157 170 1, see [168 Sec. 5.7] for other variations of 
Tucker and CP. In [93 110[|111||115| , combinations of low -rank tensor formats with hierarchical matrices 
are investigated. 



2.7 A priori approximation results 

As an essential prerequisite for the success of tensor-based computations, it is important to decide whether 
a tensor generated by a certain multivariate function f(x\ , . . . , Xd) can be well approximated by a low-rank 
tensor decomposition. As discussed in Section |2~Tj the tensor rank is closely linked to approximating / by 
a sum of separable functions. Only in exceptional cases, it will be possible to represent / exactly by such 
a sum, see ||26] [T95][208l . 

In general, one is therefore interested in an approximation of the form 



f(xi, ...,x d ) 



R 

E 



f(2) 



(7) 



For a function of the form f(x%, . . . , Xd) = g(x\ + ■ ■ ■ + xj), such an approximation can be immediately 
obtained from approximating g by a sum of exponentials. For this purpose, various approaches have 
been discussed in |27}|29|[36]|37j. Other techniques include applying numerical quadrature to an integral 
representation of the function, see, e.g., |24 77 87 110 115| , Taylor series expansion [249,2501, and 
polynomial interpolation 1 19 33). Based on results by Temlyakov 1 243 - 245 1 on bilinear approximation 
rates, singular value estimates for the hierarchical Tucker decomposition of functions in mixed Sobolev 
spaces have been obtained in | 234) , see also 1 106) . In fact, a sparse grid approximation to / can be 
turned quite effectively into a low-rank tensor decomposition [ 108 1 17) . General nonlinear best i?-term 
approximation schemes |49 50 171 1 represent another important technique, which we cannot cover in 
detail. 

Even for smooth /, it may not always be possible to attain sufficiently low ranks, especially when the 
variation of / is too strong across its entire domain of definition. In this case, it can be advantageous to 
subdivide the domain and approximate / on each subdomain separately with a low-rank tensor decompo- 
sition, see 1 11 14 T3J for examples. As first discussed in [26|, an approximation of the form |7} can also 
be used to approximate linear operators on tensors, see also Section 2.8 below. 



In the other direction, S VD-based approximations of a function-related tensor yield an approximation of 
the underlying function, where the L 2 -norm approximation error can be directly controlled by the truncated 
singular values. For smooth functions, best approximations in tensor formats are known to inherit the reg- 
ularity of the approximated function [252 254|. This also holds for SVD based quasi-best approximations, 
for which even the smoothness of the error can be controlled |109) . This can be used, e.g., for deriving L°° 
error estimates [ |109[ or approximation results for the basis matrices | 234) . 

For quantum many-body systems, the approximability of the ground state by a low-rank TT decompo- 
sition is closely linked to the concepts of entropy and entanglement, see [22 1) for an introduction. 
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2.8 Low-rank decomposition of linear operators 

The matrix representation of a linear operator 

A . Rni xn 2 x-xn d _^ R m 1 xm 2 x-xm li) % ^ 

can be viewed as an mini x m 2 n 2 x • • • x rn^n^ tensor after pairing up row and column indices: 

Ai 1 ,i 2 ,... } i d ;j 1 J 2 ,...,j d => ■A(il,jl),(' i 2j2),---,(id,jd)- 

This view allows to apply any of the low-rank tensor decompositions discussed above to A. Such a low- 
rank decomposition of A is useful, e.g., for performing the matrix-vector product A(X) efficiently when 
X itself admits a low-rank decomposition. This idea appears to be ubiquitous in the literature on low-rank 
tensor decomposition, see ]26) for an early reference. For example, in the study of strongly correlated 



quantum systems, matrix product operators (MPO) were introduced in [262 270 1, which corresponds to 
representing A in the TT decomposition. As pointed out in | |116| , having A in a low -rank decomposition 
also allows to compute an approximate inverse by combining the Newton-Hotelling-Schulz algorithm with 
truncation, see Sectionl3.ll 



2.9 Tensorization 

Reverting the process of vectorization, the entries Xj of a vector x £ M. N , N = 2 d , can be rearranged 
into an 2 x 2 x • • • x 2 tensor X of order d. (The binary representation j — 1 = Y2u=i 2 AI ~ 1 « /J yields a 
simple mapping to the corresponding multi-index . . . , i^) of X.) In turn, a low-rank approximation of 
X yields a compression of the original vector x. In combination with the TT decomposition, this idea of 
tensorization or quantization is usually called Quantics-TT or Quantized-TT (QTT); it was first used as a 
compression scheme for matrices in [205 1 , and introduced for a more general setting in |154| . 

Quantization is particularly interesting when the vector x represents a function / : I — > K evaluated at 
2 d points, usually uniformly distributed in the interval. The exact and approximate ranks of X for various 
functions / have been discussed for the TT and HT decompositions in [ 104 108 154). 

Applying quantization to each mode of a tensor X 6 ^Nx---xN Q f or( j er _/y _ 2 d , yields a 2 x 2 x 
• • • x 2 tensor y of order d ■ D. This gives rise to a variety of mixed low-rank tensor decompositions, as 
discussed in j621[T54") . 

QTT has been applied to the solution of PDEs and eigenvalue problems [144 154, 162], evaluation of 



boundary integrals in BEM [164|, convolution 1 107| and the FFT |63) , 108, 230 1 . A connection between 



QTT and the wavelet transform is discussed in (2T5J. 

One important ingredient of QTT is that the involved matrices can be represented in a way that con- 
forms to the format [ 162 1. For the following matrices, QTT representations have been discussed: Toeplitz 



matrices [136 216], (inverse) Laplace operators [139 205], linear diffusion operators [62 138 1. 



2.10 Software 

There are several Matlab toolboxes available for dealing with tensors in CP and Tucker decomposition, 



including the Tensor Toolbox fl2] 13 1, the A- way toolbox J7J, the PLS .Toolbox 0267), and the Tensor- 
lab [ 240 1 . The TT- Toolbox [207 1 provides Matlab classes covering tensors in TT and QTT decom- 
position, as well as linear operators. There is also a Python implementation of the TT-toolbox called 
ttpy (204]. The htucker toolbox ]173| provides a MATLAB class representing a tensor in HT decom- 
position. 

The TensorCalculus library [79| is a mathematically oriented C++ library, allowing for computations 
with general tensor networks. The Heidelberg MCTDH Package [268] is a set of Fortran programs for 
multi-dimensional quantum dynamics. ALPS [ 18 1 provides C++ libraries for simulating strongly correlated 
quantum mechanical systems, including DMRG. Block | 237) is a C++ implementation of the DMRG 



algorithms discussed in ]236| . The tensor contraction engine flO) automatically generates near-optimal 
code for tensor contractions in many-body electronic structure methods. 
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3 Algorithms 

In applications for function-related tensors, X is often given implicitly, e.g., as the solution to a linear sys- 
tem or eigenvalue problem. There are mainly two different types of approaches to obtain an approximation 
to X. A first class of methods is based on combining classical iterative algorithm with repeated low-rank 
truncation. A second class is based on reformulating the problem at hand as an optimization problem, 
constraining the admissible set to low-rank tensors, and applying various optimization techniques. 



3.1 Iterative methods combined with truncation 

In principle, any vector iteration for solving a linear algebra problem involving a tensor can be combined 
with truncation in any of the low-rank decompositions discussed above. To illustrate the basic principle, 
let us consider the (preconditioned) Richardson iteration for solving a linear system A(X) = B: 

X k+1 =X k +uV(B-A(X k )), 

where V is a preconditioner and w is a suitably chosen scalar. Letting T denote truncation in any of the 
low-rank tensor decompositions discussed above, we obtain the truncated Richardson iteration 

x k+1 = r(x k + ljv(b - A(x k )) 

which has been proposed [ 165| in combination with the CP decomposition. 

Other examples of combining iterative methods with low-rank truncation include: 



• the (shift-and-invert) power method combined with CP 1 26 27 1; 



• the (shift-and-invert) power and Lanczos methods combined with CP and Tucker 1 1 1 1 14 1; 

• a restarted Lanczos method combined with TT 1 1261; 



conjugate gradient type-methods for symmetric eigenvalue problems combined with truncation for 
low-rank matrices (JM), as well as for TT (187), HT (172) and QTT (181) . 

the Richardson method combined with QTT 1 161 1 and low-rank matrix decompositions |189| ; 

a projection method combined with HT 1 16 1; 



• the conjugate gradient method, BiCGStab and other Rrylov subspace methods combined with HT 1 169 
|171| |246| and low-rank matrix decompositions (32); 

• GMRES combined with TT (59). 

When applying an iterative method in combination with a low-rank decompositions, the ranks will in- 
evitably grow quite quickly in the course of the iteration. For example, the sum of k tensors can multiply 
the ranks by k, while the pointwise (Hadamard) product between two tensors can even square the ranks. 
To gain efficiency, it is therefore advisable to not let this rank growth happen explicitly and combine such 
an operation directly with truncation. This has been discussed for sums in [ 173 246 1 and for the Hadamard 
product (and other bilinear operations) in (232), see also (206). 

Preconditioners not only help accelerate convergence but numerical evidence suggests that an effective 
preconditioner leads to a limited intermediate rank growth. At the same time, it is mandatory that the 
preconditioner can be applied efficiently to a low-rank tensor decomposition. In particular, this is the case 



when the preconditioner itself admits a low-rank decomposition in the sense of Section 2.8 There are 
various techniques to construct such preconditioners. An early technique is based on best approximation 
in the Frobenius norm by a Kronecker product [251 256], by a short sum of Kronecker products [203 1, 



or by a more general low-rank tensor decomposition. Other techniques include the use of approximate 



inverses for high-dimensional Laplace operators 1 151 165 172 210 1, low-rank manipulation of the PDE 



coefficients (60) , low-rank tensor approximation of multilevel preconditioners (8), and low-rank tensor 
diagonal preconditioners for wavelet discretizations (TT) . 
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3.2 Optimization-based algorithms 

In many cases, a linear algebra problems involving a tensor can be posed as an optimization problem. For 
example, it is well known that a symmetric positive definite linear system A(X) = B can be turned into 

min Ux,A(X))-(X,B), (8) 

where (• , •) corresponds to the standard inner product for the vectorization of the tensors. For nonsymmetric 
linear systems, an optimization problem can be obtained by minimizing the norm of the residual. For sym- 
metric eigenvalue problems, the Rayleigh-quotient minimization or, more generally, the trace minimization 
principle can be used. 

Once the optimization problem is set up, the set of admissible tensors X is then constrained to a low- 
rank decomposition, for example, to all tensors with fixed tensor rank or fixed multilinear rank. Even when 
the original optimization problem, such as (jHJ, is convex and in principle simple, the resulting constrained 
optimization problem is highly nonlinear and non-convex in general. A number of heuristic approaches 
to the solution of such constrained optimization problems are available, including ALS (alternating linear 
scheme). The basic principle of ALS is to optimize every factor of the low-rank decomposition separately 
and to sweep over all factors repeatedly. It is probably most natural to combine ALS with the CP decom- 
position, see 1 27 68 1 for an application of this idea to linear systems. The combination of ALS with the TT 



decomposition has been considered in |65-67 124|. The convergence of ALS for the TT decomposition 
has been studied in |226|. 

The ALS scheme can be improved in various ways. One quite successful improvement for decomposi- 
tions described by tensor networks is to join two neighboring factors, optimize the resulting supernode, and 
split the result into separate factors by a low-rank factorization. Originally, this so called DMRG method 
had been developed for the simulation of strongly correlated quantum lattice systems, see (235} for an 
overview. Later on, the ideas of DMRG have been picked up and extended to other applications in the 
numerical analysis community in a series of papers |65 124 127 160 171 206). 



There is a growing interest in applying so called Riemannian optimization techniques [ 1 ] to (jHJ. Ex- 
amples include nonlinear conjugate gradient or Newton-like methods on manifolds of low-rank matri- 
ces (35][T9TJ[198]|257||258) or low-rank tensors (52j|53j[T3TJ[T32j|255) , see also Section|34] For tensors in 
CP decomposition, the approximate solution of ([8]) by gradient techniques has been discussed in [78 1. 

3.3 Successive rank-1 approximation 

A tempting and surprisingly successful approach to the solution of high-dimensional problems is to build 
up a low-rank approximation from successive rank-1 approximations. This idea has been suggested in the 
context of various applications, including Fokker-Planck equations |4| and stochastic partial differential 
equations |202|. 

We will illustrate the basic idea with a simple example. Consider a linear system A(X) = B with the 
solution tensor X 6 ]jrnx — xn,^ Assume we already have a CP approximation X r of tensor rank r: 

vec(X r ) = u[ d} ® u[ d - 1} <g> •• • <g> + • • ■ + 4 d) ® ® • •• ® (9) 

We then search for a rank-1 correction 

w = 4% <g> <gi • • • ® 4+1, 

such that X r+ i = X r + W is an improved approximation, that is, 

A(X r+1 )^B & A(W) « B - A(X r ) (10) 



Analogous to Section 3.2 the unknown vectors . . . , m>W can be determined by turning ( fT0| ) into a 
nonlinear (optimization) problem and applying standard methods, such as the alternating direction method. 
This procedure is repeated until the residual A(X r ) — B is sufficiently small. Of course, such a greedy 
approach will not yield the best rank-i? approximation after R steps |241[ . However, it is important to 
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remember that these methods aim at a more moderate goal, to obtain a reasonable approximation after R 
steps, with R not too large. Convergence results in this direction can be found in [ 3 39 40 , 84-86] 179) . 



A number of improvements to the simple scheme outlined above have been proposed to increase its 
convergence speed, see, e.g., l|4]|95 200, 202). A connection between best rank-1 or, more generally, best 
rank-m approximations to a nonlinear eigenvalue problem is explained in ]201) . This connection also mo- 
tivates the use of the term generalized spectral decomposition. Many further developments, improvements, 
and extensions of successive low-rank approximation techniques have taken place during the last years; we 
refer to [43 1 for an overview. 



3.4 Low-rank methods for dynamical problems 

Let us consider a dynamical system on W 11 x --- xn a : 

X(t) = F(X(t)), X(t ) = X Q , (11) 

for which a typical application is the spatial discretization of a time-dependent d-dimensional PDE. Dy- 
namical low-rank methods aim to determine an approximation y(t) in a manifold M. of low -rank tensors 
by restricting the dynamics of ( fTT) to the tangent space Ty^M.: 

y(t)eT y{t) M such that \\y(t) - F(y(t))\\ = min! (12) 

As explained in |183) , this approximation is closely related to the can Dirac-Frenkel-McLachlan variational 
principle in quantum molecular dynamics. 

Initially proposed for low -rank matrix manifolds in [166|, dynamical low -rank methods have been ex- 
tended to low-rank tensors in Tucker |l67l|T99l , TT/MPS~ |LT^|L?3l|163||186) , and HT J9j[l86j[255) de- 
composition. The efficient and robust numerical integration of ( fT2") i is crucial to the success of dynamical 



low-rank methods; apart from the references above, this aspect has been discussed in ( 184 185 1. 

A more immediate approach to ( fTT| is to combine a standard time stepping method, such as the explicit 
and implicit Euler methods, with low-rank truncation in every time step [64]. An alternative, which allows 
to control the error global-in-time, is to apply iterative solvers to a space-time formulation [5 8 61, 64][94) . 

3.5 Black box approximation 

Suppose that a matrix A or a tensor X is defined through a function that returns entries at arbitrary positions. 
Then the goal of black box approximation is to find a good low -rank approximation based only on relatively 
few entries. It is important to emphasize that the selection of the entries can be controlled by the user. In 
this respect, this situation is quite different from the growing area of tensor completion, see, e.g., |92, 182 1, 
where the selection of the entries is usually prescribed by the application. 

For an to x n matrix A, the so called cross approximation method [19,34 100,248] produces an ap- 
proximation of the form 

A(:, J)A(I, J)- X A{I,:), (13) 

where Matlab notation is used to denote the submatrices of A corresponding to the index sets / C 
{1, . . . , to} and J C {1, . . . , n} . In the pth step of cross approximation as described in [21 , 209], the 
entry of largest magnitude in the column j p of A — A(:, J)A(I, J) _1 ^4(7, :) is calculated and its position 
is denoted by i p . Then the entry of largest magnitude in the row i p of that matrix is calculated and its 
position is denoted by j p +%. Moreover, both index sets are updated: / •<— I U {i p } and J <— J U {j p }- 
Volume maximization is an alternative entry selection strategy that attempts to maximize | det (A(I, J)) , 
see |99l|248l. 

A first extension of cross approximation to tensors was proposed in [ 209 1 for approximating third-order 
tensors by a Tucker decomposition. Essentially, this extension consists of applying the algorithm above to 
an arbitrary matricization of the tensor. However, the rows of this matricization (corresponding to slices of 
the tensor) are further approximated by a low rank matrix, again using cross approximation. In [88 , 21 1| , 



this method has been combined with multilevel ideas and applied to quantum chemistry. The adaptive 
cross approximation for the Tucker decomposition has been analyzed in [20|. Another variant for third and 
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fourth order tensors, focussing on interpolation properties, is discussed in | |197| . A cross approximation 



method for the TT decomposition has been proposed in [214 231). 

Based on fiber crosses, a quite different extension for tensors of arbitrary order can be found in (73). In 
this method a set of multi-indices I 1 , . . . , I p € [1, m] x •••[]., rid] is computed successively. Subspaces 

are constructed approximately containing all //-mode fibers passing through at least one of the multi- 
indices. The tensor is then approximated by a CP decomposition |2| under the constraint it*. G U^, using 
general optimization methods. The multi-indices are selected by performing an alternating direction search 
along the fibers of the error tensor. Also based on fiber crosses, a black box approximation for a tensor in 
the HT decomposition is given in 1 14 TT) . 



Randomized algorithms represent an alternative way to quickly extract a low-rank approximation from 
partial information on the entries of the matrix, see [ 120 1 and the references therein. These ideas have been 
extended to low -rank tensor decompositions, for the Tucker decomposition in 1 70 89 247] and for the TT 
decomposition in | |219) . 

3.6 Other algorithms 

For linear systems and eigenvalue problems with a very particular structure, it is sometime possible to 
design specialized algorithms that can be more efficient and easier to analyze. This applies, in particular, 



to discretizations of the multi-dimensional Poisson equation |22 101 170 1188[. 
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